-
Notifications
You must be signed in to change notification settings - Fork 3
Rust metrics #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Rust metrics #17
Conversation
Signed-off-by: Andrew Steurer <[email protected]>
Closes #8 (probably) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran this locally and it worked. Very exciting!! I think we have quite a bit of iteration to do before this is ready. My first concern is getting the WIT looking right then the Spin and opentelemetry-wasi implementations will fall out of that.
Still loading all this stuff back into my head after a long time away so this is just a preliminary bout of comments and questions to get us started.
It might speed up this whole process if you could in a paragraph or two provide some color on how you ended up with the WIT that you have right now so I can have some more context.
Good work. Excited to see all this.
My process for building this came from looking closely at the Let me know if you need more context. |
In case there are concerns about me referencing an unstable API for metrics, looks like the folks in the |
Spent some time reading the spec 1, looking at what you have, and pondering. I think it would be useful to back up and look at this from a 10,000 foot view. 10,000 foot viewAt the highest level we have a component that has a bunch of metrics that need to make it out of the component (typically into the host runtime, but maybe a parent component in the case of composition). There's two ways we can do this: push or pull. Push means the guest decides when to send it's metrics to the host. For example this is how wasi-otel traces works. Pull means that the host decides when to get metrics from a guest. PullPull means that the host decides when to get metrics from a host. Some pseudo-WIT might look like:
The guest would provide an implementation of this On the opentelemetry-wasi side of things (ref) we would likely model it as something like I imagine a lot could be said on why there are pros and cons to the pull architecture, but the best I have right now is that my gut is telling me we should explore push first. I imagine in the future we may need to support both push and pull for different use cases. PushPush means the guest decides when to send it's metrics to the host. Some pseudo-WIT might look like:
The most naive way we could model this in opentelemetry-wasi is by saying that each instrument will immediately export the data when it is called therefore bypassing the aggregation that occurs within the opentelemetry SDK for metrics normally. This is potentially simpler, but I don't think we should do this because a common reason metrics is used is for performance critical parts of code where you don't want to be jumping between the host and guest all the time. A less naive way to model this would be to have something like In our pre-wasi-p3 world though it is hard to build a
I don't know which is the right choice. It's worth noting that in either pattern we would basically telling the user to make sure they run some version of The world is complicatedHere's a couple of unordered things that are complicate the design space:
Looking at your current implementationLet's interpret your current implementation through the lens I've laid out so far. It seems like you've modelled it off of My understanding of the necessity of What do I think we should doWe don't have all the info to make a perfect decision, but here is what I think we should do.
Footnotes
|
97a1dd7
to
6435fa9
Compare
Edit: No longer relevant |
Signed-off-by: Andrew Steurer <[email protected]>
6435fa9
to
bf8425f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not a full review, but going to stop here b/c I don't want to get into the nits until we address this core stuff
/// `collect` gathers all metric data related to a Reader from the SDK | ||
collect: func(metrics: resource-metrics) -> result<_, otel-error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to be called export
now.
variant otel-error { | ||
already-shutdown, | ||
timeout(duration), | ||
internal-failure(string), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical that we're actually observing these set of errors with our setup. Are you sure this is the exact set of errors that are possible and that we want to bake into the wit? Or should we just have the error be a string for now?
/// The WASI representation of the `OTelSdkError`. | ||
/// | ||
/// See https://github.com/open-telemetry/opentelemetry-rust/blob/353bbb0d80fc35a26a00b4f4fed0dcaed23e5523/opentelemetry-sdk/src/error.rs#L15 | ||
variant otel-error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Redundant to say something is a WASI X in a WIT file.
Nit: Don't love us linking to the Rust SDK as opposed to some canonical OTel reference spec page.
/// Aggregated metrics data from an instrument. | ||
variant aggregated-metrics { | ||
/// All metric data with `f64` value type. | ||
%f64(metric-data), | ||
/// All metric data with `u64` value type. | ||
%u64(metric-data), | ||
/// All metric data with `s64` value type. | ||
%s64(metric-data), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why this variant is necessary.
} | ||
} | ||
|
||
impl MetricReader for WasiMetricReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe stick the comment I suggested about why we have a manual reader here to explain all these simple impls 🤷 ?
fn temporality(&self, kind: InstrumentKind) -> Temporality { | ||
match kind { | ||
InstrumentKind::ObservableCounter | ||
| InstrumentKind::ObservableGauge | ||
| InstrumentKind::ObservableUpDownCounter => { | ||
panic!("Async InstrumentKinds are not yet supported"); | ||
} | ||
_ => self.reader.temporality(kind), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't they supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of observable instruments is that they periodically export in the background. Using a ManualReader
to export observable instruments would mean that metrics would be sent all at once, effectively making them non-observable counters. I think it would be confusing for people to be able to use observable counters and not have them work how they expect.
@@ -0,0 +1,17 @@ | |||
[package] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think the tracing and metrics examples living in the same example is nicer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure! Part of me likes the clear separation of examples; however, I can see how keeping everything together might make for a more interesting example. I'm open to either.
When you get a chance, will you review and let me know if this is good to merge?